-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Don't identify decimals as strings. #7710
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7710 +/- ##
===============================================
+ Coverage 81.86% 82.51% +0.64%
===============================================
Files 101 101
Lines 16884 17450 +566
===============================================
+ Hits 13822 14398 +576
+ Misses 3062 3052 -10
Continue to review full report at Codecov.
|
Looks like this is a slightly more subtle problem. There's more to be done here to actually enable decimal operations; the way that they're currently be rejected is problematic (what I fixed above), but they are currently being rejected intentionally and I'll have to do a bit more work to actually enable them. |
@gpucibot merge |
As documented in this pandas issue,
is_string_type
for pandas is not strict and will characterize a whole bunch of things as strings that aren't. For our purposes, this is problematic because basically all subclasses ofExtensionDType
will be classified as strings by that function. This is definitely not appropriate, so I modified our version ofis_string_dtype
to explicitly reject all of our extension dtypes (previously it was only excluding categorical types). I'm not 100% confident that no other parts of the code base rely on the current (erroneous) behavior, but the cudf tests all passed for me locally and my attempt to trace all calls ofutils.is_string_dtype
all look to be places where the change gives more correct behavior, so I think our best bet is to just move forward with this change. Any problems that result from this change in the future due to other code relying on the current behavior should probably be characterized as bugs in the calling code and fixed there. The same goes for for external codes that relied on this behavior; this change is potentially breaking for them as well, but again is something that they should be addressing.